Abstract: This article introduces an adaptive sorting algorithm
that relocates elements accurately by substituting their values
into a function. We focus on building this function, which is
essentially the mapping between record values and their
corresponding sorted locations. The time complexity of the
algorithm is O(n) when the records are uniformly distributed.
A similar approach can also be used in searching algorithms.

Index Terms: Algorithm/protocol design and analysis, sorting and
searching, data structures

I. Introduction 

We live in a world obsessed with keeping information, and to find
it, we must keep it in some sensible order [1]. Computers spend a
considerable amount of their time keeping data in order [2]. The
objective of a sorting method is to rearrange the records so that
their keys are ordered according to some well-defined ordering
rule [3].

The essence of sorting is a mapping between record values and
their corresponding ordered positions. A perfect sorting algorithm
would accomplish this goal with a single calculation per element:
substituting the value of an element into a function that returns
its location.

This article describes a new sorting algorithm that aims to
implement the mapping described above. Assuming the mapping is
linear, we devised two approaches. One depends on the maximum and
minimum values of the records; the other depends on statistical
properties of the records. The second approach takes more time to
determine the mapping.



To make the mapping more accurate, a second mapping pass is
applied to the intervals where the record density is high.

This algorithm consists of two parts, a mapping routine and a
post-mapping routine. Both are discussed.

The performance of the algorithm is also discussed. For uniformly
distributed records, the time complexity is O(n).

II. Preliminaries 

In the following sections, we describe our algorithm for sorting
an array of elements, which we call records. All array positions
contain out-of-order records that are to be sorted. To simplify
matters, we assume these records are all real numbers of type
double.¹ We further assume that all operations can be done in
main memory.

In the following discussion, the number of records is denoted N.
Sorting is viewed as putting N records into N prepared boxes. To
identify these boxes, they are assigned indices, which are
integers in the interval [1, N]. After sorting, the records should
be located in the boxes in ascending order.

Finally, we name the function to be introduced the guessing
function. The name comes from one of its properties: it "guesses"
the location of a record. The routine of substituting records into
the guessing function is called mapping.

¹ The "double" type is defined by the ANSI C++ standard.
III. Building the guessing function
A. Basic properties of guessing function

The guessing function is defined as follows.

Definition 1: The guessing function is a function whose argument
is the value of a record and whose return value is the location of
this record after sorting.

The ideal guessing function should have the following properties:

• It should be a single function.

• The function domain should be bounded by the values of the
minimum and the maximum records.

• The function range should be [1, N].

It is easy to infer that the minimum record should be put into the
first box, whereas the maximum record should be put into the last
box. Denote the maximum value of the elements as $x_{\max}$ and
the minimum value of the elements as $x_{\min}$.



B. Two terminals approach

To keep the function as simple as possible, we assume the guessing
function is a linear function with two terminals, $(x_{\min}, 1)$
and $(x_{\max}, N)$. The equation of the guessing function is

$$\frac{x - x_{\min}}{x_{\max} - x_{\min}} = \frac{n - 1}{N - 1} \tag{1}$$

where x is the value of a record and n is the index of the box
where the record is located.
Thus

$$n = \frac{x - x_{\min}}{x_{\max} - x_{\min}}(N - 1) + 1 \tag{2}$$



Since the indices of the boxes are integers, we need to round n
down. We then obtain the simplest guessing function:

$$g_1(x) = \left\lfloor \frac{x - x_{\min}}{x_{\max} - x_{\min}}(N - 1) \right\rfloor + 1 \tag{3}$$



Definition 2: The global tangent is defined as the tangent (slope)
of the guessing function of all the records:

$$k_{global} = \frac{N - 1}{x_{\max} - x_{\min}} \tag{4}$$

The reason why we call it the "global tangent" will be explained
later.

The guessing function can be rewritten as

$$g_1(x) = \lfloor (x - x_{\min}) k_{global} \rfloor + 1 \tag{5}$$
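As an illustration, Eqs. (4) and (5) can be sketched in C++ as follows. This is a minimal sketch, not the authors' implementation: the function names `global_tangent` and `guess_box` are hypothetical, and the code assumes that the maximum is strictly greater than the minimum and that there is at least one box.

```cpp
#include <cmath>
#include <cstddef>

// Global tangent, Eq. (4): slope of the line through (x_min, 1) and (x_max, N).
// Assumption: x_max > x_min and n_boxes >= 1.
double global_tangent(double x_min, double x_max, std::size_t n_boxes) {
    return static_cast<double>(n_boxes - 1) / (x_max - x_min);
}

// Guessing Function I, Eq. (5): 1-based box index for a record value x
// with x_min <= x <= x_max.
std::size_t guess_box(double x, double x_min, double k_global) {
    return static_cast<std::size_t>(std::floor((x - x_min) * k_global)) + 1;
}
```

For example, with records between 0 and 8 and N = 5 boxes, the global tangent is 0.5, so the value 3 is guessed to belong to box 2 and the maximum 8 to box 5.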



C. An alternative approach

We also devised an alternative approach that adapts to normally
distributed records.

According to the properties of the Gaussian distribution, almost
all elements lie in the symmetric interval
$(M - 3\sigma, M + 3\sigma)$, where M is the mean and $\sigma$ is
the standard deviation [4].

We can assume that the difference between a record's value and the
mean lies in the interval $(-3\sigma, 3\sigma)$, while the
corresponding box indices run from 1 to N. So we can also define
$k_{global}$ as $\frac{N - 1}{6\sigma}$. There is one difference
from the first approach: such a mapping may produce a box index
greater than N or less than 1, so a clamping routine is needed to
limit the box index to [1, N].

Since this approach needs at least two passes to obtain the
statistical information, and it needs to check every index, it
takes more time than the first approach to build the guessing
function. Its mapping, however, may be more accurate.
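The statistical variant can be sketched in the same way. The names `Stats`, `mean_and_stddev` and `guess_box_gaussian` are hypothetical. The sketch makes three assumptions that are not stated in the text: it uses the population standard deviation, it takes the interval width to be six standard deviations, and it requires a nonzero standard deviation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats {
    double mean;
    double stddev;
};

// Two passes over the records: one for the mean, one for the deviation.
// Assumption: records is non-empty; the population formula (divide by N) is used.
Stats mean_and_stddev(const std::vector<double>& records) {
    const double count = static_cast<double>(records.size());
    double sum = 0.0;
    for (double x : records) sum += x;
    const double mean = sum / count;
    double squares = 0.0;
    for (double x : records) squares += (x - mean) * (x - mean);
    return {mean, std::sqrt(squares / count)};
}

// Maps x linearly from (mean - 3*sigma, mean + 3*sigma) onto [1, N],
// then clamps indices that fall outside [1, N].
// Assumption: s.stddev > 0 and n_boxes >= 1.
std::size_t guess_box_gaussian(double x, const Stats& s, std::size_t n_boxes) {
    const double k_global = static_cast<double>(n_boxes - 1) / (6.0 * s.stddev);
    const double raw = std::floor((x - (s.mean - 3.0 * s.stddev)) * k_global) + 1.0;
    const double clamped = std::min(std::max(raw, 1.0), static_cast<double>(n_boxes));
    return static_cast<std::size_t>(clamped);
}
```

The clamp is what distinguishes this variant from the two terminals approach: a value far outside the three-sigma interval is still assigned to the first or the last box.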

D. Hash table and guessing function

Some may consider our method to be just like a hash table, but the
two are based on different principles. Moreover, the guessing
function can be extended to a more precise one.

E. More precise mapping: guessing function II

Whichever approach is adopted, one disadvantage of the guessing
function defined above is that records with similar values are
mapped into the same box. This is because the tangent of the
guessing function is a constant. An improved function that uses a
variable tangent can map elements more accurately, since its
tangent adapts to the density of record values. To distinguish the
two, we refer to the function in Eq. (5) as Guessing Function I
and the following function as Guessing Function II.

Guessing Function II is based on Guessing Function I. The only
difference is that the tangent of Guessing Function II is
variable. The routine applied to every box is the same as the one
performed in Guessing Function I. The distribution array, whose
elements are denoted A[n], should be defined here.
Definition 3: The distribution array is an array of size N in
which the value of A[n] is the total number of records in the
boxes whose indices are not greater than n.

The value of an array element whose index is less than 1 or
greater than N is 0.

Then we can infer the following.

Lemma 1: The final positions of the elements in the nth box are
between A[n - 1] + 1 and A[n].
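To illustrate Definition 3 and Lemma 1, the distribution array can be built with one counting pass followed by a prefix sum. This is a sketch under assumptions: the name `distribution_array` is hypothetical, the array is stored with an extra element A[0] = 0 so that A[n - 1] is defined for the first box, and the maximum is assumed to be strictly greater than the minimum.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns a vector a of size N + 1 with a[0] = 0 and a[n] equal to the number
// of records mapped into boxes 1..n by Guessing Function I, Eq. (5).
// Assumption: x_min and x_max are the true extremes and x_max > x_min.
std::vector<std::size_t> distribution_array(const std::vector<double>& records,
                                            double x_min, double x_max) {
    const std::size_t n_boxes = records.size();
    const double k_global = static_cast<double>(n_boxes - 1) / (x_max - x_min);
    std::vector<std::size_t> a(n_boxes + 1, 0);
    for (double x : records) {
        const std::size_t n =
            static_cast<std::size_t>(std::floor((x - x_min) * k_global)) + 1;
        ++a[n];  // first pass: count the records in each box
    }
    for (std::size_t n = 1; n <= n_boxes; ++n) {
        a[n] += a[n - 1];  // prefix sum turns counts into cumulative totals
    }
    return a;
}
```

For the records 0, 1, 1.5, 8, 7 this gives the cumulative totals 3, 3, 3, 4, 5 for boxes 1 to 5, so by Lemma 1 the single record in box 4 must end up at position 4.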
For any record, we have

Lemma 2:

$$\frac{n - 1}{k_{global}} + x_{\min} \le x < \frac{n}{k_{global}} + x_{\min} \tag{6}$$

where x is its value and n is the index of the box to which it is
mapped by Guessing Function I.
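Lemma 2 can be read as the interval of record values that land in a given box. A minimal sketch follows; the name `box_value_range` is hypothetical, and the interval is taken to be closed on the left and open on the right, which is what the rounding down in Eq. (5) implies.

```cpp
#include <cstddef>
#include <utility>

// Value interval [first, second) that Guessing Function I maps to box n
// (1-based), following Lemma 2. Assumption: k_global > 0.
std::pair<double, double> box_value_range(std::size_t n, double x_min,
                                          double k_global) {
    const double lo = static_cast<double>(n - 1) / k_global + x_min;
    const double hi = static_cast<double>(n) / k_global + x_min;
    return {lo, hi};
}
```

With a minimum of 0 and a global tangent of 0.5, box 2 covers the values from 2 up to, but not including, 4.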

Combining Lemma 1 and Lemma 2, we obtain that the guessing
function in this box has two terminals,
$(\frac{n - 1}{k_{global}} + x_{\min}, A[n - 1] + 1)$ and
$(\frac{n}{k_{global}} + x_{\min}, A[n])$.

In particular, if this box is the first box, where n = 1, the
terminals of the guessing function are $(x_{\min}, 1)$ and
$(\frac{1}{k_{global}} + x_{\min}, A[1])$. In the last box, the
left terminal is
$(\frac{N - 1}{k_{global}} + x_{\min}, A[N - 1] + 1)$.

We can then consider each box independently. Before applying
Guessing Function I to each box, the local tangent of the guessing
function should be introduced.

Definition 4: The local tangent of the guessing function is
defined as the tangent of the line that passes through the points
$(\frac{n - 1}{k_{global}} + x_{\min}, A[n - 1] + 1)$ and
$(\frac{n}{k_{global}} + x_{\min}, A[n])$.

This definition is the reason why we call the tangent defined in
Eq. (4) the global tangent.
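So we have

$$k_{local} = \frac{A[n] - A[n - 1] - 1}{\left(\frac{n}{k_{global}} + x_{\min}\right) - \left(\frac{n - 1}{k_{global}} + x_{\min}\right)} = k_{global}(A[n] - A[n - 1] - 1) \tag{7}$$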



Substituting the above into Eq. (5), we obtain the local guessing
function in a box as

$$g_{local}(x) = \left\lfloor \left(x - \frac{n - 1}{k_{global}} - x_{\min}\right) k_{local} \right\rfloor + 1 \tag{8}$$



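Considering that the positions of the elements in this box start
after A[n - 1], we obtain the guessing function for the entire set
of records:

$$g_2(x) = A[n - 1] + \left\lfloor \left(x - \frac{n - 1}{k_{global}} - x_{\min}\right) k_{local} \right\rfloor + 1 \tag{9}$$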

where n is calculated by Eq. (5) and $k_{local}$ is given by
Eq. (7).

We name Eq. (9) Guessing Function II.
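The steps leading to Guessing Function II can be sketched as a single function. This is an illustration under assumptions, not the authors' code: the name `guess_position` is hypothetical, the distribution array is stored with an extra element a[0] = 0, and the offset used is A[n - 1], so that the guessed position stays within the range given by Lemma 1. The record x is assumed to be one of the records counted in the array.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Guessed 1-based final position of x.
// a has size N + 1, a[0] = 0, and a[n] is the cumulative count of Definition 3.
// Assumption: x was counted when a was built, so box n holds at least one record.
std::size_t guess_position(double x, double x_min, double k_global,
                           const std::vector<std::size_t>& a) {
    // Box index from Guessing Function I, Eq. (5).
    const std::size_t n =
        static_cast<std::size_t>(std::floor((x - x_min) * k_global)) + 1;
    const std::size_t count = a[n] - a[n - 1];  // records in box n
    // Local tangent, Eq. (7).
    const double k_local = k_global * static_cast<double>(count - 1);
    // Smallest value that maps to box n (left end of the Lemma 2 interval).
    const double box_start = static_cast<double>(n - 1) / k_global + x_min;
    // Local position inside the box, Eq. (8).
    const std::size_t local =
        static_cast<std::size_t>(std::floor((x - box_start) * k_local)) + 1;
    // Shift by the number of records in all earlier boxes.
    return a[n - 1] + local;
}
```

Records that share a box can still be guessed to the same position, which is why the post-mapping routine remains necessary.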

F. The necessity of guessing function II

Some of our tests indicate that the time taken by Guessing
Function II is almost 5 times that of Guessing Function I. If your
records are close to uniformly distributed, Guessing Function I is
enough. If your records are concentrated in some intervals,
Guessing Function II may be needed.

IV. Post-mapping routines 

With either Guessing Function I or Guessing Function II, we cannot
guarantee that every box contains only one record. To the records
in the same box, we apply a traditional sorting algorithm so that
each box is sorted. A one-pass traversal then retrieves them and
returns a sorted array.
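Putting the mapping routine and the post-mapping routine together, the whole procedure with Guessing Function I might look like the following sketch. The name `guessing_sort` is hypothetical. Two choices here are assumptions made for brevity rather than something the text prescribes: each box is stored as its own `std::vector`, and `std::sort` stands in for the "traditional sorting algorithm".

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the records in ascending order.
std::vector<double> guessing_sort(const std::vector<double>& records) {
    if (records.size() < 2) return records;
    const auto extremes = std::minmax_element(records.begin(), records.end());
    const double x_min = *extremes.first;
    const double x_max = *extremes.second;
    if (x_min == x_max) return records;  // all values equal: nothing to do

    const std::size_t n_boxes = records.size();
    const double k_global = static_cast<double>(n_boxes - 1) / (x_max - x_min);

    // Mapping routine: place every record into its guessed box, Eq. (5).
    std::vector<std::vector<double>> boxes(n_boxes);
    for (double x : records) {
        const std::size_t n =
            static_cast<std::size_t>(std::floor((x - x_min) * k_global)) + 1;
        boxes[n - 1].push_back(x);  // box indices are 1-based in the text
    }

    // Post-mapping routine: sort inside each box, then traverse the boxes once.
    std::vector<double> sorted;
    sorted.reserve(records.size());
    for (std::vector<double>& box : boxes) {
        std::sort(box.begin(), box.end());
        sorted.insert(sorted.end(), box.begin(), box.end());
    }
    return sorted;
}
```

Because the guessing function never maps a larger value to a smaller box index, concatenating the sorted boxes in index order yields a fully sorted array.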

V. Performance analysis 

A. Time complexity 

In the uniform distribution condition, the time complexity of our
algorithm is O(n).

Proof: The probability of an element being mapped into any given
box is equal for all boxes. We can infer that the probability that
a box contains no element is $\left(1 - \frac{1}{N}\right)^N$. And
we have

$$\lim_{N \to \infty} \left(1 - \frac{1}{N}\right)^N = e^{-1}$$

So the expected number of boxes that contain no element is
$e^{-1}N$.
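The limit used above can be checked numerically. The following sketch evaluates the probability that a given box stays empty for a finite N; the function name `empty_box_probability` is hypothetical.

```cpp
#include <cmath>

// Probability that one given box receives none of n records when each record
// falls into one of n boxes uniformly and independently: (1 - 1/n)^n.
double empty_box_probability(double n) {
    return std::pow(1.0 - 1.0 / n, n);
}
```

For N = 2 the value is 0.25, and as N grows it approaches $e^{-1} \approx 0.368$, so roughly 37% of the boxes are expected to be empty after the first mapping pass.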

After the first mapping pass, N elements are mapped into
$(1 - e^{-1})N$ boxes. The expected number of elements in each of
these boxes is $\frac{1}{1 - e^{-1}}$. Considering that the final
position of every element is limited to the box into which it is
mapped, in the second mapping pass the expected error interval of
the mapping is less than $\frac{1}{1 - e^{-1}}$.

So the N elements need fewer than $\frac{N}{2(1 - e^{-1})}$ moves
in total. Considering that the mapping operation and the
construction of the array A[n] have linear time complexity, we can
conclude that the time complexity is O(n) [5]. ■



[3] R. Sedgewick, Algorithms in C++. Reading, Massachusetts:
Addison-Wesley, 1992.

[4] R. Barlow, Statistics. John Wiley & Sons, 1989.

[5] A. V. Gelder, Computer Algorithms: Introduction to Design and
Analysis. Pearson Education, 2000.



B. Space complexity 

Before mapping, the space for storing the results of Guessing
Functions I and II is proportional to N. The space for the
distribution array is also proportional to N. The space for
storing other variables is constant. So the space complexity of
both Guessing Function I and Guessing Function II is O(n).



VI. COMPARISON WITH OTHER SORTING ALGORITHMS

Some tests were performed on a computer with an AMD Athlon 2000+
CPU running Fedora Core 1 (Linux kernel 2.4.22-1). The test
programs were executed in multiuser text mode and compiled with
gcc 3.3.2 without optimization. Uniformly distributed numbers
ranging from -20000000 to 20000000 were generated and sorted in
the test programs. Table I lists the sorting time of the different
algorithms as the number of records increases. Fig. 1 also
illustrates the elapsed time in comparison with some other
algorithms.
TABLE I

Sorting time of different algorithms

Algorithm / Scale                                             2^8       2^11      2^14      2^17      2^20
Quicksort [3]                                                 0.000075  0.000525  0.005425  0.058475  0.600225
Guessing function, one pass mapping, two points approach      0.000025  0.00025   0.002575  0.056725  0.603525
Guessing function, one pass mapping, alternative approach     0.000025  0.00005   0.00275   0.05105   0.60855
Guessing function, two passes mapping, two points approach    0.000075  0.0003    0.00365   0.0848    N.A.
Guessing function, two passes mapping, alternative approach   0.00005   0.00045   0.0043    0.081975  N.A.




Fig. 1. Time elapsed vs. record scale for uniformly distributed records.